Actual infection counts are estimated from fatality data rather than from positive test data, which are more biased (or even meaningless). The only input to the model is the time-to-death distribution: the distribution of the time between infection and death. The approach is simple: using the time-to-death distribution in reverse, each death is randomly assigned to an infection day. The number of deaths assigned to a given day (adjusted for censoring), divided by the infection fatality rate (IFR), is an estimate of the true infection count on that day. Adjusting for censoring means correcting for people who were infected on previous days but for whom not enough time has elapsed to know whether they will survive.
Based on [REFs], I am using a shifted negative binomial (mean=23.9, size=7.9) distribution for time to death (given death from COVID-19):
Equation here
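As a rough illustration of the distribution named above, the sketch below evaluates a shifted negative binomial PMF in the mean/size parameterization using only the standard library. The function names and the `shift` argument are hypothetical. The value of the shift is not stated here, so it defaults to 0 as a placeholder, and the sketch assumes that the mean and size describe the unshifted part.

```python
import math

def nbinom_pmf(k, mean, size):
    """Negative binomial PMF in the mean/size parameterization."""
    if k < 0:
        return 0.0
    p = size / (size + mean)  # success probability implied by mean and size
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def time_to_death_pmf(day, mean=23.9, size=7.9, shift=0):
    """P(death occurs `day` days after infection).

    `shift` is a placeholder: the actual shift is an assumption here.
    """
    return nbinom_pmf(day - shift, mean, size)

# Tabulate the PMF over a horizon long enough to hold nearly all the mass.
pmf = [time_to_death_pmf(d) for d in range(200)]
total_mass = sum(pmf)
mean_days = sum(d * p for d, p in enumerate(pmf))
```

With a nonzero shift the whole curve simply moves right by that many days, so the mean of the shifted distribution would be `mean + shift` under this assumption.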
The deaths on each day are randomly distributed to previous days based on the probability that infection took place on that day. The figure below illustrates how this works. For example, in the United States there were 1988 deaths recorded on Apr 13. This number shows up on the top right in the row corresponding to Apr 13. These 1988 deaths are then distributed to previous days according to the time-to-death distribution. For example, we can see from the PMF plot above that about 4.46% of the deaths on any day will be assigned to the day 20 days in the past. Thus, we would expect about 89 deaths to be assigned to Mar 24, which is 20 days before Apr 13. The figure shows results for one simulation, which assigned 81 deaths to Mar 24. This indicates that 81 of the people who died on Apr 13 were estimated to have been infected on Mar 24.
Likewise for Apr 12, 74 of the 1996 people who died were estimated to have been infected on Mar 24. Adding all the numbers in the Mar 24 column, we find that a total of 830 of the people who have died were estimated to have been infected on Mar 24.
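The random assignment described above can be sketched as follows. This is a minimal illustration, not the site's actual code: the function name is hypothetical, days are represented as integer indices, and the five-day `toy_pmf` is a made-up stand-in for the real time-to-death distribution. Only the two death counts come from the example.

```python
import random
from collections import Counter

def impute_infection_days(deaths_by_day, delay_pmf, rng):
    """Randomly assign each death to an infection day.

    deaths_by_day: {day_index: number of deaths recorded that day}
    delay_pmf:     delay_pmf[d] = P(death occurs d days after infection)
    Returns a Counter {infection_day_index: imputed deaths}.
    """
    delays = range(len(delay_pmf))
    imputed = Counter()
    for death_day, n_deaths in deaths_by_day.items():
        # Draw one infection-to-death delay per death, weighted by the PMF.
        sampled = rng.choices(delays, weights=delay_pmf, k=n_deaths)
        for delay in sampled:
            imputed[death_day - delay] += 1
    return imputed

# Toy PMF over delays of 0..4 days (hypothetical, for illustration only).
toy_pmf = [0.1, 0.2, 0.4, 0.2, 0.1]
deaths = {9: 1996, 10: 1988}  # counts from the Apr 12 and Apr 13 example
imputed = impute_infection_days(deaths, toy_pmf, random.Random(0))
```

Summing the contributions from every death day for a given infection day corresponds to adding up a column in the figure.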
Before we can use this number to estimate the infection count, we have to adjust for the people who were infected on Mar 24 but for whom not enough time has elapsed to know whether they will survive. Referring to the CDF (cumulative distribution function) plot above, we expect 39.9% of the deaths from COVID-19 to occur within the first 20 days after infection. This implies that the 830 imputed deaths represent only about 39.9% of the deaths that we can expect to be attributed to Mar 24 infections once more time elapses. Dividing 830 by 39.9% to adjust for future deaths, we get an estimate of 2078.12 for the number of people infected on Mar 24 who will unfortunately succumb to COVID-19.
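The censoring adjustment is a single division, sketched below with a hypothetical helper name. Using the rounded 39.9% gives roughly 2,080; the small gap to the 2078.12 quoted above is presumably because that figure was computed from an unrounded CDF value.

```python
def adjust_for_censoring(imputed_deaths, fraction_observed):
    """Scale imputed deaths up by the share of deaths expected to have occurred so far."""
    if not 0 < fraction_observed <= 1:
        raise ValueError("fraction_observed must be in (0, 1]")
    return imputed_deaths / fraction_observed

# Values from the Mar 24 example; 0.399 is the rounded CDF value at 20 days.
eventual_deaths = adjust_for_censoring(830, 0.399)
```

The adjustment grows quickly as the fraction observed shrinks, which is one reason estimates for recent days are so uncertain.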
Taking that number and dividing by the estimated IFR provides an estimate of the total number of people infected on Mar 24:

| fatality rate | 3.0% | 2.0% | 1.0% | 0.5% |
|---|---|---|---|---|
| number infected | 69,271 | 103,906 | 207,812 | 415,624 |
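The table values follow from dividing the censoring-adjusted death estimate by each candidate IFR. A minimal sketch, with a hypothetical function name and the 2078.12 figure taken from the text:

```python
def infections_from_deaths(eventual_deaths, ifr):
    """Estimated infections = eventual deaths / infection fatality rate."""
    return eventual_deaths / ifr

eventual_deaths = 2078.12                  # censoring-adjusted deaths for Mar 24
ifr_scenarios = [0.03, 0.02, 0.01, 0.005]  # the four rates shown in the table

estimates = {ifr: round(infections_from_deaths(eventual_deaths, ifr))
             for ifr in ifr_scenarios}

for ifr, n_infected in estimates.items():
    print(f"IFR {ifr:.1%}: {n_infected:,} infected")
```

Because the IFR enters only as a divisor, halving the assumed fatality rate doubles the estimated infection count.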
Repeating this procedure for every day gives the estimated infection counts over time. Estimates for days close to the current day have a high level of uncertainty because limited fatality data are available. To help control the erratic behavior, the estimates are adjusted slightly to encourage the log of the estimated infection counts to be linear. This has no practical effect on the estimates more than about 10 days before the current date.
The estimated infection count plots on the main page also include uncertainty bands (confidence intervals) formed by repeating this procedure 1000 times and shading in the 95% pointwise intervals (i.e., using the .025 and .975 quantiles).
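Pointwise bands of this kind can be computed by taking quantiles across simulation runs separately for each day. The sketch below is illustrative only: the helper names are hypothetical, the quantile uses simple linear interpolation (which may differ from the rule actually used), and the simulated series are random stand-ins rather than real model output.

```python
import random

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def pointwise_band(simulations, lower=0.025, upper=0.975):
    """Return one (lower, upper) pair per day across all simulation runs."""
    band = []
    for values_for_day in zip(*simulations):
        ordered = sorted(values_for_day)
        band.append((quantile(ordered, lower), quantile(ordered, upper)))
    return band

# Stand-in for 1000 repetitions of the procedure (hypothetical noisy series).
rng = random.Random(1)
simulations = [[rng.gauss(100 * (day + 1), 10) for day in range(5)]
               for _ in range(1000)]
band = pointwise_band(simulations)
```

Each interval is computed independently per day, so the band describes day-by-day uncertainty rather than a joint statement about the whole curve.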
TODO
TODO Details of the team